Abstract: We investigate the problem of allocating energy
from renewable sources to flexible consumers in electricity 
markets. We assume there is a renewable energy supplier 
that provides energy according to a time-varying (and possibly 
unpredictable) supply process. The plant must serve consumers 
within a specified delay window, and incurs a cost of drawing 
energy from other (possibly non-renewable) sources if its own 
supply is not sufficient to meet the deadlines. We formulate two 
stochastic optimization problems: The first seeks to minimize the 
time average cost of using the other sources (and hence strives 
for the most efficient utilization of the renewable source). The 
second allows the renewable source to dynamically set a price 
for its service, and seeks to maximize the resulting time average 
profit. These problems are solved via the Lyapunov optimization 
technique. Our resulting algorithms do not require knowledge of 
the statistics of the time-varying supply and demand processes 
and are robust to arbitrary sample path variations. 



I. Introduction 

The highly variable and unpredictable nature of some re- 
newable energy sources (such as wind and solar) has been 
a major obstacle to their integration. For example, a recent 
study conducted by Enernex for wind power integration in 
Minnesota [4] indicates that the variability and day-ahead 
forecast errors will result in an additional $2.11 to $4.41 (for
15% and 25% penetration) per MWh of delivered wind power.
Along the same lines, the CAISO report [5] predicted that
ten-minute real-time energy prices could increase substantially due
to wind forecasting errors and identified day-ahead and same- 
day forecasts and modeling as important tasks for integration 
of renewable resources. 

The necessity to offset variability by stand-by generators 
and system backup investments substantially increases the cost 
of renewables. One approach that can mitigate this problem is 
to couple this supply variability to demand side flexibility [1], 
[2], [3]. The renewable power suppliers could sell their energy
at a lower price to consumers that are willing to wait in a
queue, provided that their requests will be served within a
pre-agreed deadline. This effectively offers renewable energy
at a lower price to consumers willing to provide this extra time
flexibility. The renewable power supplier can then use this
flexibility to deliver the energy when it is available. The supplier
will sometimes, hopefully rarely, be in a situation where a prior
deadline commitment cannot be met and will have to
purchase the extra energy from the energy spot market (or
maintain a costly system backup). Papavasiliou and Oren [3]
introduced this problem and proposed an exact backward
dynamic programming algorithm and an efficient approximate
dynamic programming algorithm for the scheduling decisions
of the renewable energy supplier.

In this paper we build a similar model and utilize the 
technique of Lyapunov optimization initially developed in 
[10] [11] [12] for dynamic control of queueing systems for 
wireless networks. We show that the queuing model naturally 
fits in the renewable supplier scheduling problem and present 
a simple energy allocation algorithm that does not require 
prior statistical information and is provably close to optimal. 
The proposed framework can be extended to include pricing, 
multiple queues (with different deadlines) and different objec- 
tive functions, building on the general results from [10]. We 
finally evaluate the proposed algorithm on actual CAISO spot
market and wind energy production data and show a substantial
reduction in the operating costs for the renewable supplier
compared to a simple greedy algorithm.

In particular, we consider a single renewable energy plant
that operates in discrete time with unit timeslots t ∈
{0, 1, 2, ...}, and provides s(t) units of energy on each slot
t. The s(t) process corresponds to the renewable supply and
is assumed to be time varying and unpredictable. Since we
assume no storage, the energy s(t) must either be used or
wasted. Demands for this energy arrive randomly according to
a process a(t) (being the amount of energy that is requested
on slot t). We assume that consumers requesting energy are
flexible, and can tolerate their energy requests being satisfied
with some delay. The requests are thus stored in a queue. Every
slot t, we use all of our supply s(t) to serve the requests in
the queue in a First-In-First-Out (FIFO) manner. However, this
may not be enough to meet all of the requests in a timely
manner, and hence we also decide to purchase an amount of
energy x(t) from an outside (possibly non-renewable) plant.
Letting Q(t) represent the total energy requests in our queue
on slot t, we have the following update equation:

$$Q(t+1) = \max[Q(t) - s(t) - x(t), 0] + a(t) \tag{1}$$

The value x(t) is a control decision variable, and incurs a
cost x(t)γ(t) on slot t, where γ(t) is a process that specifies
the per-unit cost of using the outside energy supply on slot
t. The value of γ(t) can represent a current market price
for guaranteed energy services from (possibly non-renewable)
sources. As such, the decision to use x(t) units of energy on
slot t means the outside source agrees to provide this much
energy at time t + K for some fixed (and small) integer K ≥ 0,
for the price x(t)γ(t). Without loss of generality, we assume
throughout that K = 0, so that the energy request is removed
from our queue on the same slot in which we decide to use
the outside source. In the actual implementation, requests that
are served from the outside source can be removed from the
primary queue Q(t) but must still wait an additional K slots.
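To make the bookkeeping in (1) concrete, the following Python sketch steps the queue through a short three-slot sample path and accumulates the purchase cost x(t)γ(t). The function names, the hand-picked purchases x(t), and all numbers are illustrative assumptions, not part of the model.

```python
def queue_update(Q, s, x, a):
    """Update (1): serve up to s(t) + x(t) units, then add the new requests a(t)."""
    return max(Q - s - x, 0.0) + a


def slot_cost(x, gamma):
    """Cost x(t) * gamma(t) of the outside purchase on one slot."""
    return x * gamma


# Hypothetical sample path over three slots.
supply = [2.0, 0.0, 5.0]    # s(t)
requests = [3.0, 4.0, 1.0]  # a(t)
purchase = [0.0, 1.0, 0.0]  # x(t), chosen by hand for this illustration
price = [10.0, 30.0, 20.0]  # gamma(t)

Q, total_cost = 0.0, 0.0
for s, a, x, g in zip(supply, requests, purchase, price):
    total_cost += slot_cost(x, g)
    Q = queue_update(Q, s, x, a)
print(Q, total_cost)  # 2.0 30.0
```

Note that a(t) only enters the backlog at the end of slot t, so requests arriving on slot t are first available for service on slot t + 1.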

We first look at the problem of choosing x(t) to stabilize
our queue Q(t) while minimizing the time average of the
cost x(t)γ(t) and also providing a guarantee on the maximum
delay D_max spent in the queue. If the future values of supply,
demand, and market price (s(t), a(t), γ(t)) were known
in advance, one could in principle make x(t) decisions that
minimize total time average cost, possibly choosing x(t) = 0
for all t if it is possible to meet all demands using only
the renewable energy s(t). The challenge is to provide an
efficient algorithm without knowing the future. To this end,
we first assume the vector process (s(t), a(t), γ(t)) is i.i.d.
over slots but has an unknown probability distribution. Under
this assumption, we develop an algorithm, parameterized by a
positive value V, that comes within O(1/V) of the minimum
time average cost required to stabilize the queue, with a
worst-case delay guarantee that is O(V). The parameter V can
be tuned as desired to provide average cost arbitrarily close
to optimal, with a tradeoff in delay. We further show that
the same algorithm is provably robust to non-i.i.d. situations,
and operates efficiently even for arbitrary sample paths of
(s(t), a(t), γ(t)). Finally, we extend the problem to consider
pricing decisions at the renewable energy source, so that the
requests a(t) are now influenced by the current prices. In
this case, we design a related algorithm that maximizes time
average profit.

The Lyapunov optimization technique we use [10][11][12]
is related to the primal-dual and fluid-model techniques
in [13][14][15][16]. The work in [10][11][12] establishes a
general [O(1/V), O(V)] performance-congestion tradeoff for
stochastic network optimization problems with i.i.d. (and more
general ergodic) processes. Recent work in [17][18] provides
similar results on a sample path basis, without any probabilistic
assumptions. We apply these results in the current paper.
Further, we extend the theory by introducing a novel virtual
queue that turns an average delay bound of O(V) (which
is achievable with the prior analytical techniques) into a
worst-case delay guarantee that is also O(V).

It is useful to distinguish the Lyapunov optimization method
that we use in this paper from dynamic programming
techniques. Dynamic programming can be used to solve
stronger versions of our problem (such as minimizing average
cost subject to a delay constraint); see, e.g., [3]. However,
dynamic programming requires more stringent system
modeling assumptions, has a more complex solution that
typically requires knowledge of the supply, demand, and
market price probabilities, and cannot necessarily adapt if these
probabilities change and/or if there are unmodeled correlations
in the actual processes. It involves computing a value
function, which can be difficult when the state space of the
system is large, and it suffers from the curse of dimensionality
when applied to high-dimensional systems (such as systems
with many queues).

In contrast, Lyapunov optimization is relatively simple to
implement, does not need a priori statistical knowledge, and is
robust to non-i.i.d. and non-ergodic behavior. Further, it has no
curse of dimensionality and hence can be applied just as easily
in extended formulations that have multiple queues
corresponding to multiple customers requesting different deadlines,
unlike dynamic programming [3], which would require
exponential complexity in the number of users.

The reason for this efficiency is that Lyapunov optimiza- 
tion relaxes the question that dynamic programming asks: 
Rather than minimizing time average cost subject to a delay 
constraint, it seeks to push time average cost towards the 
more ambitious minimum over all possible algorithms that 
can stabilize the queue (without regard to the delay
constraint). It then specifies an explicit bound on the resulting
queue congestion, which depends on the desired proximity
to the minimum cost (as defined by the [O(1/V), O(V)]
performance-congestion tradeoff). However, the resulting time
average queue congestion (and delay) is not necessarily the
optimum that could be achieved over all possible algorithms
that yield the same time average cost.

In the next section, we formulate the basic model under the
assumption that the (s(t), a(t), γ(t)) vector is i.i.d. over slots,
and present the main allocation algorithm. Section III extends
to the case when the renewable power source can set a price
for its services. These algorithms are provably robust to
non-i.i.d. situations and arbitrary sample paths of events, as shown
in Section IV. Section V presents an experimental evaluation
of our algorithm on a real six-month data set and shows
substantial gains over a simple greedy scheduling algorithm.

II. The Dynamic Allocation Algorithm 

Suppose that the supply process s(t), the request process
a(t), and the market price process γ(t), as described in the
introduction, form a vector (s(t), a(t), γ(t)) that is i.i.d. over
slots with some unknown probability distribution. We further
assume the values of s(t), a(t), γ(t) are deterministically
bounded by finite constants s_max, a_max, γ_max, so that:

$$0 \le s(t) \le s_{\max}, \quad 0 \le a(t) \le a_{\max}, \quad 0 \le \gamma(t) \le \gamma_{\max} \quad \forall t \tag{2}$$

The queue backlog Q(t) evolves according to (1). The
decision variable x(t) is chosen every slot t in reaction to the
current (s(t), a(t), γ(t)) (and possibly additional queue state
information) subject to the constraint 0 ≤ x(t) ≤ x_max for
all t, where x_max is a finite upper bound. We assume that
x_max ≥ a_max so that it is always possible to stabilize the
queue Q(t) (and this can be done with one slot delay if we
choose x(t) = x_max for all t). Define c̄ as the time average
cost incurred by our control policy (assuming temporarily that
our policy yields such a well defined limit):

$$\bar{c} \triangleq \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{\gamma(\tau) x(\tau)\}$$

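We want to find an allocation algorithm that chooses x(t) over
time to solve:

$$\text{Minimize: } \bar{c} \tag{3}$$
$$\text{Subject to: } 1)\ \bar{Q} < \infty \tag{4}$$
$$2)\ 0 \le x(t) \le x_{\max} \quad \forall t \tag{5}$$

where Q̄ is the time average expected queue backlog, defined:

$$\bar{Q} \triangleq \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{Q(\tau)\}$$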
Define c* as the infimum time average cost associated with
the above problem, considering all possible ways of choosing
x(t) over time. The value of c* is an ambitious target because
the above problem is defined only in terms of a queue stability
constraint and does not impose any additional delay constraint.
We shall construct a solution, parameterized by a constant
V > 0, that satisfies the constraints of the above problem and
pushes the average cost within O(1/V) of the optimal value
c*. Further, we show that our algorithm has the additional
property that worst-case delay is no more than O(V).

A. The Delay-Aware Virtual Queue 

We solve the above problem while also maintaining finite
worst-case delay using the following novel "virtual queue"
Z(t): Fix a parameter ε > 0, to be specified later. Define
Z(0) = 0, and define the virtual queue Z(t) for t ∈
{0, 1, 2, ...} according to the following update:

$$Z(t+1) = \max[Z(t) - s(t) - x(t) + \epsilon 1_{\{Q(t)>0\}}, 0] \tag{6}$$

where 1{Q(t) > 0} is an indicator function that is 1 if Q(t) > 0,
and zero else. The intuition is that Z(t) has the same service
process as Q(t) (being s(t) + x(t)), but now has an arrival
process that adds ε whenever the actual queue backlog is
nonempty. This ensures that Z(t) grows if there are requests in the
Q(t) queue that have not been serviced for a long time. If we
can control the system to ensure that the queues Q(t) and Z(t)
have finite upper bounds, then we can ensure all requests are
served with a worst-case delay given in the following lemma.²
Lemma 1: (Worst Case Delay) Suppose the system is controlled
so that Z(t) ≤ Z_max and Q(t) ≤ Q_max for all t, for
some positive constants Z_max and Q_max. Then all requests
are fulfilled with a maximum delay of D_max slots, where:

$$D_{\max} = \left\lceil (Q_{\max} + Z_{\max})/\epsilon \right\rceil \tag{7}$$

Proof: Consider any slot t for which a(t) > 0. We show
that the requests a(t) are fulfilled on or before time t + D_max.
Suppose not (we shall reach a contradiction). Then during slots
τ ∈ {t + 1, ..., t + D_max} it must be that Q(τ) > 0 (else
the requests a(t) would have been served before slot τ). Thus,
1{Q(τ) > 0} = 1, and from (6) we have for all
τ ∈ {t + 1, ..., t + D_max}:

$$Z(\tau+1) \ge Z(\tau) - s(\tau) - x(\tau) + \epsilon$$

Summing the above over τ ∈ {t + 1, ..., t + D_max} yields:

$$Z(t + D_{\max} + 1) - Z(t+1) \ge -\sum_{\tau=t+1}^{t+D_{\max}} [s(\tau) + x(\tau)] + D_{\max}\epsilon$$

Rearranging and using the fact that Z(t + 1) ≥ 0 and
Z(t + D_max + 1) ≤ Z_max yields:

$$\sum_{\tau=t+1}^{t+D_{\max}} [s(\tau) + x(\tau)] \ge D_{\max}\epsilon - Z_{\max} \tag{8}$$

Now note that the requests a(t) are first available for service
at time t + 1, and are part of the backlog Q(t + 1) (see (1)).
Because Q(t + 1) ≤ Q_max and because service is FIFO, these
requests a(t) are served on or before time t + D_max whenever
there are at least Q_max units of energy served during the
interval τ ∈ {t + 1, ..., t + D_max}. Because we have assumed
the requests a(t) are not served by time t + D_max, it must be
that:

$$\sum_{\tau=t+1}^{t+D_{\max}} [s(\tau) + x(\tau)] < Q_{\max}$$

Using this in (8) yields:

$$Q_{\max} > D_{\max}\epsilon - Z_{\max}$$

This implies that D_max < (Q_max + Z_max)/ε, contradicting
the definition of D_max in (7).
□

² In the case when requests are served by the outside source with an
additional delay K > 0, this bound is modified in the actual implementation
to ⌈(Q_max + Z_max)/ε⌉ + K.
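A minimal Python sketch of the virtual queue (6) and the delay bound (7) follows; the function names and the numbers in the example are hypothetical. It illustrates the intuition behind Lemma 1: while the actual backlog stays nonempty and receives no service, Z(t) grows by ε on every slot, so a bounded Z(t) forces service to keep up.

```python
import math


def virtual_queue_update(Z, Q, s, x, eps):
    """Update (6): same service s(t) + x(t) as Q(t), plus eps whenever Q(t) > 0."""
    return max(Z - s - x + (eps if Q > 0 else 0.0), 0.0)


def worst_case_delay(q_max, z_max, eps):
    """Lemma 1, bound (7): D_max = ceil((Q_max + Z_max) / eps)."""
    if eps <= 0:
        raise ValueError("eps must be positive for a finite delay bound")
    return math.ceil((q_max + z_max) / eps)


# A nonempty backlog that receives no service makes Z grow by eps every slot.
Z = 0.0
for _ in range(4):
    Z = virtual_queue_update(Z, Q=5.0, s=0.0, x=0.0, eps=0.5)
print(Z)                                 # 2.0
print(worst_case_delay(10.0, 6.0, 0.5))  # 32
```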



B. Lyapunov Optimization 

Define Θ(t) = (Z(t), Q(t)) as the concatenated vector of the
real and virtual queues. As a scalar measure of the congestion
in both the Z(t) and Q(t) queues, we define the following
Lyapunov function: L(Θ(t)) ≜ ½[Z(t)² + Q(t)²]. Define the
conditional 1-slot Lyapunov drift as follows:

$$\Delta(\Theta(t)) \triangleq E\{L(\Theta(t+1)) - L(\Theta(t)) \mid \Theta(t)\} \tag{9}$$

Following the drift-plus-penalty framework of [10][11][12],
our control algorithm is designed to observe the current queue
states Z(t), Q(t) and the current (s(t), a(t), γ(t)) vector, and
to make a decision x(t) (where 0 ≤ x(t) ≤ x_max) to minimize
a bound on the following expression every slot t:

$$\Delta(\Theta(t)) + V E\{\gamma(t) x(t) \mid \Theta(t)\}$$

where V is a positive parameter that will be used to tune a
performance-delay tradeoff. We first compute a bound on the
above drift-plus-penalty expression.

Lemma 2: (Drift Bound) For any control policy that satisfies
0 ≤ x(t) ≤ x_max for all t, the drift-plus-penalty
expression for all slots t satisfies:

$$\begin{aligned} \Delta(\Theta(t)) + VE\{\gamma(t)x(t)\mid\Theta(t)\} \le\ & B + VE\{\gamma(t)x(t)\mid\Theta(t)\} \\ & + Q(t)E\{a(t) - s(t) - x(t)\mid\Theta(t)\} \\ & + Z(t)E\{\epsilon - s(t) - x(t)\mid\Theta(t)\} \end{aligned} \tag{10}$$

where the constant B is defined:

$$B \triangleq \frac{(s_{\max} + x_{\max})^2 + a_{\max}^2 + \max[\epsilon^2, (s_{\max} + x_{\max})^2]}{2} \tag{11}$$

Proof: See Appendix A.
□
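The proof of Lemma 2 is deferred to Appendix A. As a sanity check, the Python sketch below evaluates B from (11) and tests, on randomly drawn states, the per-slot sample-path inequality L(Θ(t+1)) − L(Θ(t)) ≤ B + Q(t)[a(t) − s(t) − x(t)] + Z(t)[ε − s(t) − x(t)], which gives (10) after taking conditional expectations and adding VE{γ(t)x(t)|Θ(t)} to both sides. This sample-path form is our reconstruction of the standard argument rather than a quote of the appendix, and the parameter values are arbitrary.

```python
import random


def drift_constant(s_max, x_max, a_max, eps):
    """Constant B from (11)."""
    b_max = s_max + x_max  # largest possible service s(t) + x(t) on one slot
    return (b_max ** 2 + a_max ** 2 + max(eps ** 2, b_max ** 2)) / 2.0


def lyapunov(Z, Q):
    """Lyapunov function L(Theta(t)) = (Z(t)^2 + Q(t)^2) / 2."""
    return 0.5 * (Z * Z + Q * Q)


def bound_holds(Z, Q, s, a, x, eps, B):
    """Per-slot check: L(t+1) - L(t) <= B + Q(a - s - x) + Z(eps - s - x)."""
    Q_next = max(Q - s - x, 0.0) + a                        # update (1)
    Z_next = max(Z - s - x + (eps if Q > 0 else 0.0), 0.0)  # update (6)
    lhs = lyapunov(Z_next, Q_next) - lyapunov(Z, Q)
    rhs = B + Q * (a - s - x) + Z * (eps - s - x)
    return lhs <= rhs + 1e-9  # small tolerance for floating-point rounding


s_max, x_max, a_max, eps = 3.0, 4.0, 2.0, 0.5
B = drift_constant(s_max, x_max, a_max, eps)
rng = random.Random(0)
violations = 0
for _ in range(100_000):
    Q = rng.choice([0.0, rng.uniform(0.0, 50.0)])  # include empty queues
    Z = rng.choice([0.0, rng.uniform(0.0, 50.0)])
    s, a, x = rng.uniform(0, s_max), rng.uniform(0, a_max), rng.uniform(0, x_max)
    if not bound_holds(Z, Q, s, a, x, eps, B):
        violations += 1
print(B, violations)  # 51.0 0
```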



C. The Dynamic Algorithm 

Minimizing the right-hand side of the drift-plus-penalty
bound (10) every slot t leads to the following dynamic
algorithm: Every slot t, observe Z(t), Q(t), (s(t), a(t), γ(t)),
and choose x(t) according to the following optimization:

$$\text{Minimize: } x(t)[V\gamma(t) - Q(t) - Z(t)]$$
$$\text{Subject to: } 0 \le x(t) \le x_{\max}$$

Then update the actual and virtual queues Q(t) and Z(t) by (1)
and (6). The above minimization for the x(t) decision reduces
to the following simple threshold rule:

$$x(t) = \begin{cases} 0 & \text{if } Q(t) + Z(t) \le V\gamma(t) \\ x_{\max} & \text{otherwise} \end{cases} \tag{12}$$
The above x(t) value drives the queueing updates (1) and
(6). However, note by the max[·, 0] structure of the Q(t)
update in (1) that we may not need to purchase the full x(t)
units of energy from the outside plant on slot t. Indeed, define
x̃(t) as the actual amount purchased from the plant, given by:

$$\tilde{x}(t) = \begin{cases} x(t) & \text{if } Q(t) - s(t) \ge x(t) \\ \max[Q(t) - s(t), 0] & \text{otherwise} \end{cases} \tag{13}$$

Then we have x̃(t) ≤ x(t) for all t.
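The threshold rule (12) and the purchased amount (13) take only a few lines. The Python sketch below (the function names are hypothetical) also checks, on random inputs, that the rule attains the minimum of x(t)[Vγ(t) − Q(t) − Z(t)] over a grid on [0, x_max], and that buying x̃(t) instead of x(t) leaves the backlog update (1) unchanged.

```python
import random


def choose_purchase(Q, Z, gamma, V, x_max):
    """Threshold rule (12): buy x_max only when Q(t) + Z(t) exceeds V * gamma(t)."""
    return 0.0 if Q + Z <= V * gamma else x_max


def actual_purchase(Q, s, x):
    """Amount actually bought, (13): never more than the backlog left after s(t)."""
    return x if Q - s >= x else max(Q - s, 0.0)


# The objective x * (V*gamma - Q - Z) is linear in x, so an endpoint is optimal.
rng = random.Random(0)
V, x_max = 4.0, 3.0
for _ in range(1000):
    Q, Z, s = rng.uniform(0, 20), rng.uniform(0, 20), rng.uniform(0, 5)
    gamma = rng.uniform(0, 5)
    x = choose_purchase(Q, Z, gamma, V, x_max)
    grid = [k * x_max / 50 for k in range(51)]
    coeff = V * gamma - Q - Z
    assert x * coeff <= min(g * coeff for g in grid) + 1e-9
    x_t = actual_purchase(Q, s, x)
    # Buying x_tilde instead of x gives the same max[Q - s - x, 0] term in (1).
    assert x_t <= x and max(Q - s - x, 0.0) == max(Q - s - x_t, 0.0)
```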

Theorem 1: (Performance Analysis) Suppose x_max ≥
max[a_max, ε]. If Q(0) = Z(0) = 0, and if the above dynamic
algorithm is implemented with any fixed ε > 0 and V > 0 for
all t ∈ {0, 1, 2, ...}, then:

a) The queues Q(t) and Z(t) are deterministically bounded
by Q_max and Z_max every slot t, where:

$$Q_{\max} \triangleq V\gamma_{\max} + a_{\max}, \quad Z_{\max} \triangleq V\gamma_{\max} + \epsilon \tag{14}$$

b) The worst-case delay of any request is:

$$D_{\max} = \left\lceil (2V\gamma_{\max} + a_{\max} + \epsilon)/\epsilon \right\rceil \tag{15}$$

c) If the vector (s(t), a(t), γ(t)) is i.i.d. over slots, and if the
ε parameter is chosen to satisfy ε ≤ max[E{a(t)}, E{s(t)}],
then for all slots t > 0 the time average cost satisfies:

$$\frac{1}{t}\sum_{\tau=0}^{t-1} E\{\gamma(\tau)\tilde{x}(\tau)\} \le \frac{1}{t}\sum_{\tau=0}^{t-1} E\{\gamma(\tau)x(\tau)\} \le c^* + B/V$$

where B is defined in (11).

The above theorem demonstrates the [O(1/V), O(V)] cost-delay
tradeoff, where time average cost is within B/V of the
minimum possible time average cost c* required for queue
stability, and worst-case delay is O(V/ε). To obtain
the smallest D_max, the ε value should be chosen as large as
possible while still maintaining ε ≤ max[E{a(t)}, E{s(t)}].
We can choose ε = E{a(t)} if this expectation is known.
Using ε = 0 preserves parts (a) and (c) but does not give a
finite D_max. More discussion of the ε = 0 case is given in
Section V.
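To illustrate the tradeoff, the following Python sketch runs the dynamic algorithm with FIFO service on synthetic i.i.d. inputs and reports the average cost, the largest backlog and the largest observed delay for a few values of V. The uniform input distributions, all parameter values and the function simulate are assumptions made for illustration only; they are not the data set of Section V. The parameters satisfy x_max ≥ max[a_max, ε] and ε ≤ E{a(t)}, as Theorem 1 requires.

```python
import math
import random
from collections import deque


def simulate(V, eps, T=10_000, s_max=1.5, a_max=2.0, g_max=5.0, x_max=3.0, seed=1):
    """Threshold rule (12) on i.i.d. uniform (s, a, gamma) with FIFO service.

    Returns (max Q(t), max Z(t), max delay in slots, avg cost of x, avg cost of x_tilde).
    """
    rng = random.Random(seed)
    fifo = deque()  # each entry: [arrival slot, remaining energy]
    Z = 0.0
    max_q = max_z = 0.0
    max_delay = 0
    cost = cost_tilde = 0.0
    for t in range(T):
        s = rng.uniform(0.0, s_max)
        a = rng.uniform(0.0, a_max)
        g = rng.uniform(0.0, g_max)
        Q = sum(entry[1] for entry in fifo)
        max_q, max_z = max(max_q, Q), max(max_z, Z)
        x = 0.0 if Q + Z <= V * g else x_max            # threshold rule (12)
        x_tilde = x if Q - s >= x else max(Q - s, 0.0)  # amount actually bought, (13)
        cost += g * x
        cost_tilde += g * x_tilde
        service = s + x  # serve FIFO; leftover energy is wasted (no storage)
        while fifo and service > 0.0:
            head = fifo[0]
            served = min(service, head[1])
            head[1] -= served
            service -= served
            if head[1] <= 1e-12:  # tolerance for floating-point crumbs
                fifo.popleft()
                max_delay = max(max_delay, t - head[0])
        Z = max(Z - s - x + (eps if Q > 0 else 0.0), 0.0)  # virtual queue (6)
        if a > 0.0:
            fifo.append([t, a])  # a(t) joins the backlog Q(t + 1)
    return max_q, max_z, max_delay, cost / T, cost_tilde / T


eps, g_max, a_max = 0.5, 5.0, 2.0
for V in (1.0, 5.0, 20.0):
    q, z, d, c, c_tilde = simulate(V, eps)
    d_max = math.ceil((2 * V * g_max + a_max + eps) / eps)  # bound (15)
    print(f"V={V:5.1f} avg cost={c:.3f} (actual {c_tilde:.3f}) "
          f"max Q={q:.1f} max delay={d} <= D_max={d_max}")
```

With these inputs E{s(t)} < E{a(t)}, so some outside purchases are unavoidable. The printed average cost typically decreases toward c* as V grows, while the backlog and delay bounds grow with V; the exact figures depend on the random seed.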



D. Proof of Theorem 1

Proof: (Theorem 1 part (a)) We first show that Q(t) ≤
Vγ_max + a_max for all t. This is clearly true for t = 0 (because
Q(0) = 0). Suppose it holds for slot t. We show it also holds
for slot t + 1. Consider the case when Q(t) ≤ Vγ_max. Then
Q(t + 1) ≤ Vγ_max + a_max, because the queue can increase by
at most a_max on any slot (see dynamics (1)). Thus, the result
holds in this case.

Now consider the opposite case when Vγ_max < Q(t) ≤
Vγ_max + a_max. In this case, we have:

$$Q(t) + Z(t) \ge Q(t) > V\gamma_{\max} \ge V\gamma(t)$$

and hence the algorithm will choose x(t) = x_max according
to (12). If Q(t) − x_max − s(t) ≥ 0, then on slot t we serve
at least x_max units of energy. Because arrivals a(t) are at most
a_max (and a_max ≤ x_max), the queue cannot increase on the
next slot and so Q(t + 1) ≤ Q(t) ≤ Vγ_max + a_max. Finally,
if Q(t) − x_max − s(t) < 0, then by (1) we have Q(t + 1) =
a(t) ≤ a_max, again being less than or equal to Vγ_max + a_max.

Therefore, Q(t) ≤ Vγ_max + a_max for all t. The proof that
Z(t) ≤ Vγ_max + ε for all t is similar and omitted for brevity.
□

Proof: (Theorem 1 part (b)) This follows immediately from
Lemma 1 together with part (a). □

The proof of Theorem 1 part (c) requires a preliminary
lemma from [11]. To introduce the lemma, define an
(s, a, γ)-only policy to be one that observes the current vector
(s(t), a(t), γ(t)) and makes a stationary and randomized
decision x*(t) based purely on this vector (and independent
of the queue backlogs or past system history), subject to the
constraint 0 ≤ x*(t) ≤ x_max.

Lemma 3: (Characterizing Optimality [11]) If the vector
(s(t), a(t), γ(t)) is i.i.d. over slots, then there exists an
(s, a, γ)-only policy x*(t) that satisfies:

$$E\{\gamma(t)x^*(t)\} = c^* \tag{16}$$
$$E\{s(t) + x^*(t)\} \ge E\{a(t)\} \tag{17}$$

where c* is the infimum time average cost in the stochastic
optimization problem (3)-(5), and the above expectations
are with respect to the stationary distribution of the vector
(s(t), a(t), γ(t)) and the possibly randomized action x*(t)
made in reaction to this vector.

Proof: (Lemma 3) This follows as a special case of results
in [11]. □

Proof: (Theorem 1 part (c)) We have assumed that ε ≤
max[E{a(t)}, E{s(t)}]. We first prove the result for the case
when ε ≤ E{a(t)}. On every slot t, the dynamic choice of
x(t) in (12) minimizes the right-hand side of the drift bound
(10) (given the observed queue sizes Θ(t) = (Z(t), Q(t))), over
all alternative choices x*(t) that satisfy the required bounds
0 ≤ x*(t) ≤ x_max (including randomized choices for x*(t)).
Thus, by (10) we have:

$$\begin{aligned} \Delta(\Theta(t)) + VE\{\gamma(t)x(t)\mid\Theta(t)\} \le\ & B + VE\{\gamma(t)x^*(t)\mid\Theta(t)\} \\ & + Q(t)E\{a(t) - s(t) - x^*(t)\mid\Theta(t)\} \\ & + Z(t)E\{\epsilon - s(t) - x^*(t)\mid\Theta(t)\} \end{aligned}$$

where x*(t) is any alternative (possibly randomized) decision.
Plugging the (s, a, γ)-only policy x*(t) from (16)-(17) (known
to exist by Lemma 3) into the right-hand side of the above
inequality for slot t, and noting that this policy makes decisions
independent of queue backlogs (so that E{γ(t)x*(t)|Θ(t)} = c*
by (16)), yields:

$$\Delta(\Theta(t)) + VE\{\gamma(t)x(t)\mid\Theta(t)\} \le B + Vc^* \tag{18}$$

where we have used the facts that:

$$E\{a(t) - s(t) - x^*(t)\mid\Theta(t)\} = E\{a(t) - s(t) - x^*(t)\} \le 0 \tag{19}$$
$$E\{\epsilon - s(t) - x^*(t)\mid\Theta(t)\} = E\{\epsilon - s(t) - x^*(t)\} \le 0 \tag{20}$$

where (19) follows from (17) and the fact that the (s, a, γ)-only
policy x*(t) is i.i.d. over slots and hence independent
of the queue backlogs Θ(t), and (20) follows from (17) together
with the fact that E{a(t)} ≥ ε.
Taking expectations of (18) and using the law of iterated
expectations with the definition of Δ(Θ(t)) in (9) yields:

$$E\{L(\Theta(t+1))\} - E\{L(\Theta(t))\} + VE\{\gamma(t)x(t)\} \le B + Vc^*$$

The above holds for all slots t ≥ 0. Summing over t ∈
{0, 1, ..., M − 1} for some positive integer M yields:

$$E\{L(\Theta(M))\} - E\{L(\Theta(0))\} + V\sum_{t=0}^{M-1} E\{\gamma(t)x(t)\} \le BM + VMc^*$$

Using the fact that L(Θ(0)) = 0 (because all queues are
initially empty), and that L(Θ(M)) ≥ 0 (because the Lyapunov
function is non-negative) and dividing by VM yields:

$$\frac{1}{M}\sum_{t=0}^{M-1} E\{\gamma(t)x(t)\} \le c^* + B/V$$

This holds for all M > 0, proving the second inequality of
part (c) for the case when ε ≤ E{a(t)}. The first inequality
holds because x̃(t) ≤ x(t) and γ(t) ≥ 0 for all t.

We have only used the assumption that ε ≤ E{a(t)} to
ensure that inequality (20) holds. If ε ≤ E{s(t)}, then clearly
(20) holds (because x*(t) ≥ 0), regardless of the value of E{a(t)}.
Thus, the result holds whenever ε ≤ max[E{a(t)}, E{s(t)}], proving
the theorem. □
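Lemma 3 also suggests how c* can be computed when (s(t), a(t), γ(t)) takes finitely many values: minimize E{γ(t)x*(t)} over (s, a, γ)-only policies subject to (17) and 0 ≤ x*(t) ≤ x_max. The sketch below assumes this linear-program characterization of c*. Because the constraint couples the states only through the total expected purchase, the program is a fractional knapsack that covers the deficit E{a(t)} − E{s(t)} in the cheapest states first. The function name and the two-state example are hypothetical.

```python
def optimal_cost(states, x_max):
    """Smallest E{gamma x*} subject to E{s + x*} >= E{a} and 0 <= x* <= x_max.

    states: list of (probability, s, a, gamma) describing a finite-support
    i.i.d. distribution. Returns the optimal expected cost per slot.
    """
    deficit = sum(p * (a - s) for p, s, a, _ in states)  # E{a} - E{s}
    cost = 0.0
    for p, _, _, gamma in sorted(states, key=lambda st: st[3]):  # cheapest first
        if deficit <= 0.0:
            break
        if p <= 0.0:
            continue
        buy = min(x_max, deficit / p)  # amount bought whenever this state occurs
        cost += p * gamma * buy
        deficit -= p * buy
    if deficit > 1e-12:
        raise ValueError("infeasible: x_max cannot cover the average deficit")
    return cost


states = [(0.5, 1.0, 2.0, 10.0), (0.5, 2.0, 2.0, 40.0)]
print(optimal_cost(states, x_max=3.0))  # 5.0
print(optimal_cost(states, x_max=0.5))  # 12.5
```

In the second call the cheap state can only cover half of the deficit, so the rest must be bought in the expensive state.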

III. Pricing for Maximum Profit 

We now extend the problem to consider pricing decisions.
Instead of a process a(t) that represents requests arriving at
slot t, we define a process y(t), called the demand state on
slot t. The demand state captures any properties of the demand
that may affect requests for the renewable energy source in
reaction to the price advertised on slot t. A simple example
is when y(t) can take one of two possible values, such as
HIGH and LOW, representing different demand conditions
(such as during peak times or non-peak times for requesting
energy). Another example is when y(t) represents the number
of consumers willing to purchase renewable energy on slot t.
We assume the demand state y(t) is known at the beginning
of each slot t (after stating our algorithm, we show a special
case in which y(t) does not need to be known).

Every slot t, in addition to choosing the amount of energy
x(t) purchased from outside sources, the renewable energy
plant makes a binary decision b(t) ∈ {0, 1}, where b(t) = 1
represents a willingness to accept new requests on slot t, and
b(t) = 0 means no requests will be accepted. If b(t) = 1
is chosen, the plant also chooses a per-unit-energy price p(t)
within an interval 0 ≤ p(t) ≤ p_max, where p_max is a
pre-established maximum price. The arriving requests a(t) are then
influenced by the current price p(t), the current market price
γ(t), and the current demand state y(t), according to a general
demand function F(p, y, γ). Specifically, the values of a(t) are
assumed to be conditionally i.i.d. over all slots with the same
p(t), y(t), γ(t), and satisfy:

$$E\{a(t) \mid p(t), y(t), \gamma(t), b(t) = 1\} = F(p(t), y(t), \gamma(t))$$

We assume the function F(p, y, γ) is continuous in p for each
given y and γ.³ We further assume the arrivals a(t) continue to
be worst-case bounded by a_max, regardless of p(t), y(t), γ(t).

³ This continuity is only used to ensure the resulting min-drift decision has
a well defined optimizing price p(t) every slot.

The queue iteration Q(t) still operates according to (1), with
the understanding that a(t) is now influenced by the pricing
decisions. Let φ(t) represent the instantaneous profit earned
on slot t, defined as:

$$\phi(t) = b(t)p(t)a(t) - \gamma(t)x(t)$$

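We now consider the following problem:

$$\text{Maximize: } \bar{\phi} \tag{21}$$
$$\text{Subject to: } 1)\ \bar{Q} < \infty \tag{22}$$
$$2)\ 0 \le x(t) \le x_{\max} \quad \forall t \tag{23}$$
$$3)\ b(t) \in \{0, 1\},\ 0 \le p(t) \le p_{\max} \quad \forall t \tag{24}$$

where φ̄ is defined as the limiting time average profit:

$$\bar{\phi} \triangleq \lim_{t \to \infty} \frac{1}{t}\sum_{\tau=0}^{t-1} E\{\phi(\tau)\}$$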

To solve the problem, we use the same queueing structure
for Q(t) in (1) and the same virtual queue structure for
Z(t) in (6), and use the same Lyapunov function L(Θ(t))
as defined before (recall that Θ(t) is defined as the vector
(Z(t), Q(t))). However, we now consider the "penalty" −φ(t),
and so the drift-plus-penalty technique seeks to choose decisions
that minimize a bound on:

$$\Delta(\Theta(t)) - VE\{\phi(t)\mid\Theta(t)\}$$

Using the same analysis as Lemma 2, we can show the
following bound on this drift-plus-penalty expression:

$$\begin{aligned} \Delta(\Theta(t)) - VE\{\phi(t)\mid\Theta(t)\} \le\ & B - VE\{b(t)p(t)F(p(t), y(t), \gamma(t)) - \gamma(t)x(t)\mid\Theta(t)\} \\ & + Q(t)E\{b(t)F(p(t), y(t), \gamma(t)) - s(t) - x(t)\mid\Theta(t)\} \\ & + Z(t)E\{\epsilon - s(t) - x(t)\mid\Theta(t)\} \end{aligned} \tag{25}$$

Our joint energy-allocation and pricing algorithm observes
the current system state on each slot t, and chooses b(t),
p(t), and x(t) to minimize the right-hand side of the above
drift expression (given the observed Θ(t)). This reduces to
the following: Every slot t, observe queues Q(t), Z(t), and
observe s(t), γ(t), y(t). Then choose a price p(t) and an
allocation x(t) as follows:

• (Pricing p(t)) Choose p(t) as the solution to:

$$\text{Max: } F(p(t), y(t), \gamma(t))\,(Vp(t) - Q(t)) \quad \text{S.t.: } 0 \le p(t) \le p_{\max}$$

If the resulting maximum value is non-negative, choose
b(t) = 1. Else choose b(t) = 0 so that no new requests
are allowed on slot t.

• (Allocating x(t)) Choose x(t) according to (12).

• (Queue Updates) Update Q(t) and Z(t) by (1) and (6).

This pricing policy does not need to know the demand
state y(t) in the special case when F(p(t), y(t), γ(t)) =
y(t)F̂(p(t), γ(t)), so that the demand state simply scales the
demand function: for y(t) > 0, neither the maximizing price nor
the sign of the maximum depends on y(t). This pricing structure
is similar to that considered in [19] for wireless service providers.
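The pricing step can be sketched in Python by a grid search over [0, p_max]. The demand function below, F(p, y, γ) = y·e^(−p), is purely hypothetical (it ignores γ and is chosen only because it is positive, continuous and decreasing in p), as are the function names and the grid size. Because this F has the product form y(t)F̂(p(t), γ(t)), the example also shows that the chosen price does not depend on y(t) > 0.

```python
import math


def demand(p, y, gamma):
    """Hypothetical demand function F(p, y, gamma) = y * exp(-p); gamma is unused."""
    return y * math.exp(-p)


def choose_price(Q, y, gamma, V, p_max=4.0, n_grid=400):
    """Pricing step: maximize F(p, y, gamma) * (V*p - Q) over a grid on [0, p_max].

    Returns (b, p), with b = 1 if the best value is non-negative and b = 0 otherwise.
    """
    best_p, best_val = 0.0, -math.inf
    for k in range(n_grid + 1):
        p = k * p_max / n_grid
        val = demand(p, y, gamma) * (V * p - Q)
        if val > best_val:
            best_p, best_val = p, val
    return (1 if best_val >= 0.0 else 0), best_p


print(choose_price(Q=20.0, y=1.0, gamma=30.0, V=10.0))  # (1, 3.0)
print(choose_price(Q=20.0, y=5.0, gamma=30.0, V=10.0))  # (1, 3.0): y only scales F
print(choose_price(Q=50.0, y=1.0, gamma=30.0, V=10.0))  # (0, 4.0): reject new requests
```

For this F the unconstrained maximizer is p = 1 + Q(t)/V, so a growing backlog pushes the price up; once Q(t) > Vp_max, every admissible price gives a negative value and the plant sets b(t) = 0.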






A. Defining Optimality 

We define an (s, y, γ)-only policy as one that jointly chooses
x*(t), b*(t), p*(t) subject to 0 ≤ x*(t) ≤ x_max, b*(t) ∈
{0, 1}, 0 ≤ p*(t) ≤ p_max according to a stationary and
randomized decision that depends only on s(t), y(t), γ(t). As
in [11], it can be shown that the supremum time average profit
φ* associated with the problem (21)-(24) can be achieved
over the class of (s, y, γ)-only policies. Thus, there exists an
(s, y, γ)-only policy x*(t), b*(t), p*(t) that satisfies:

$$E\{b^*(t)p^*(t)a^*(t) - \gamma(t)x^*(t)\} = \phi^* \tag{26}$$
$$E\{a^*(t) - s(t) - x^*(t)\} \le 0 \tag{27}$$

where a*(t) represents the random requests on slot t associated
with pricing decisions b*(t), p*(t) and under the random
demand state y(t) and the random market price γ(t). It is
useful to define a* ≜ E{a*(t)}. In the case when the policy
p*(t), b*(t), x*(t) that satisfies (26)-(27) is not unique, we
define a* as the maximum value such that there exists an
(s, y, γ)-only policy that satisfies (26)-(27).

B. The Joint Pricing and Allocation Algorithm 

Theorem 2: Assume that x_max ≥ max[a_max, ε], and that
Q(0) = Z(0) = 0. If the above joint pricing and allocation
policy is implemented every slot with fixed parameters ε > 0,
V > 0, then:

a) The worst-case delay D_max and backlog Q_max are
the same as before (given in (14) and (15)), where Q_max is
O(V) and D_max is O(V/ε).

b) If the vector (s(t), y(t), γ(t)) is i.i.d. over slots, and if
ε ≤ max[a*, E{s(t)}] (where a* is defined in Section III-A),
then:⁴

$$\frac{1}{t}\sum_{\tau=0}^{t-1} E\{\phi(\tau)\} \ge \phi^* - B/V \quad \forall t > 0$$

where B is defined in (11), and φ* is the optimal time average
profit that can be achieved by any algorithm that satisfies the
constraints of the problem (21)-(24).

Proof: See Appendix C. □

IV. Non-I.I.D. Models 

Here we extend the analysis to treat non-i.i.d. models. For
brevity, we consider only the problem of Section II that seeks
to allocate x(t) without regard to pricing.⁵ Specifically, we
assume that the processes s(t), a(t), γ(t) vary randomly over
slots according to any probability model (with arbitrary time
correlations). However, we continue to assume the sample
paths are bounded so that 0 ≤ s(t) ≤ s_max, 0 ≤ a(t) ≤ a_max,
0 ≤ γ(t) ≤ γ_max for all t. We show that the same algorithm
of Section II, which allocates x(t) according to (12), still
provides efficient performance in this context. We assume that
Q(0) = Z(0) = 0, and that fixed parameters V > 0 and ε > 0
are used. We continue to assume that x_max ≥ max[a_max, ε].

⁴ Note that actual profit can be defined as φ̃(t) = b(t)p(t)a(t) − γ(t)x̃(t),
with x̃(t) defined in (13). Clearly φ̃(t) ≥ φ(t) for all t, and so the time average
of the actual profit φ̃(t) is even closer to the optimal value φ*.

⁵ Similar analysis can be applied to the pricing problem for this non-i.i.d.
case, using the technique in [18] that incorporates the random demand a(t)
with expectation F(p(t), y(t), γ(t)), where the y(t) and γ(t) processes are
arbitrary sample paths.

We first observe that the exact same worst-case backlog and
delay bounds Q_max and D_max given in (14) and (15) hold in
this non-i.i.d. case. Thus, worst-case delay is still bounded
by a constant that is O(V/ε). This is because the
proof of these bounds in Theorem 1 (a) and (b) was a sample
path proof that did not make use of the i.i.d. assumptions.
Indeed, it used only the sample path bounds 0 ≤ a(t) ≤ a_max
and 0 ≤ γ(t) ≤ γ_max for all t.

It remains only to understand the efficiency of the time average cost. To this end, we use the $T$-slot lookahead metric as defined in the universal scheduling work [17], [20]. Specifically, suppose that the sample path of $(s(t), a(t), \gamma(t))$ is chosen at time 0 for all $t$ according to some arbitrary values. For a given positive integer $T$ and a positive integer $R$, we consider the first $RT$ slots, composed of $R$ successive "frames" of size $T$. For each frame $r \in \{0, 1, \ldots, R-1\}$, we define $c_r^*$ as the optimal value of the following "ideal" problem that uses full knowledge of $(s(\tau), a(\tau), \gamma(\tau))$ over the frame:

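$$\text{Minimize:} \quad \frac{1}{T}\sum_{\tau=rT}^{(r+1)T-1} \gamma(\tau)x(\tau) \tag{28}$$

$$\text{Subject to:} \quad 1)\ \sum_{\tau=rT}^{(r+1)T-1} [s(\tau) + x(\tau) - a(\tau)] \ge 0 \tag{29}$$

$$2)\ \sum_{\tau=rT}^{(r+1)T-1} [s(\tau) + x(\tau) - \epsilon] \ge 0 \tag{30}$$

$$3)\ 0 \le x(\tau) \le x_{\max} \quad \forall \tau \in \{rT, \ldots, (r+1)T-1\} \tag{31}$$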

Thus, $c_r^*$ is the optimal (per-slot average) cost that can be achieved over frame $r$, considering all possible ways of allocating $x(\tau)$ over this frame using perfect knowledge of the future values of $(s(\tau), a(\tau), \gamma(\tau))$ over this frame, subject to ensuring that the total energy provided over the frame is at least the sum of the arrivals, and is also at least $\epsilon T$.
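Because (29) and (30) both only require the frame's total purchase $\sum_\tau x(\tau)$ to reach a threshold, the linear program (28)-(31) can be solved exactly by a greedy fill when $\gamma(\tau) \ge 0$: the required purchase is $D = \max[\sum_\tau (a(\tau) - s(\tau)),\ \epsilon T - \sum_\tau s(\tau),\ 0]$, and the cheapest way to buy it under (31) is to fill slots in increasing order of $\gamma(\tau)$. A minimal sketch (the function name `frame_opt` is hypothetical):

```python
from typing import List, Sequence, Tuple


def frame_opt(s: Sequence[float], a: Sequence[float], gamma: Sequence[float],
              eps: float, x_max: float) -> Tuple[float, List[float]]:
    """Solve (28)-(31) for one frame; return (c_r^*, optimal x)."""
    T = len(gamma)
    # (29) and (30) both reduce to a lower bound on the frame's total purchase.
    need = max(sum(a) - sum(s), eps * T - sum(s), 0.0)
    if need > T * x_max:
        raise ValueError("frame problem (28)-(31) is infeasible")
    x = [0.0] * T
    # With gamma >= 0, buying exactly `need` in the cheapest slots is optimal.
    for i in sorted(range(T), key=lambda i: gamma[i]):
        if need <= 0:
            break
        x[i] = min(x_max, need)
        need -= x[i]
    cost = sum(g * xi for g, xi in zip(gamma, x)) / T
    return cost, x


# Three slots, no renewable supply, 30 units requested in total:
# buys 20 in the cheapest slot and 10 in the next, x = [0.0, 20.0, 10.0].
print(frame_opt(s=[0.0, 0.0, 0.0], a=[10.0, 10.0, 10.0],
                gamma=[5.0, 1.0, 3.0], eps=0.0, x_max=20.0))
```

Under the standing assumption $x_{\max} \ge \max[a_{\max}, \epsilon]$, the required purchase never exceeds $Tx_{\max}$, so every frame problem is feasible.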

Theorem 3: (Universal Scheduling) Under the above assumptions, the worst case backlog and delay are bounded by $Q_{\max}$ and $D_{\max}$, as in Theorem 1. Further, for all positive integers $T$ and $R$, we have:

$$\frac{1}{RT}\sum_{\tau=0}^{RT-1}\gamma(\tau)x(\tau) \le \frac{1}{R}\sum_{r=0}^{R-1}c_r^* + \frac{BT}{V}$$

where $B$ is defined in (11).

Proof: The proof combines the techniques of the proof of Theorem 1 with the universal scheduling results in [17], [20], and is given in Appendix B. □

The above result says that the achieved time average cost over the first $RT$ slots is less than or equal to the average of the $c_r^*$ values, plus a "fudge factor" of at most $BT/V$. While the average of the $c_r^*$ values is not the same as the minimum cost that could be achieved with perfect knowledge of the future over the full $RT$ slots, this result is still interesting because the $c_r^*$ values are obtained by ideal algorithms implemented over $T$-slot frames with full knowledge of the future events in these frames.
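Theorem 3 can also be sanity-checked on a single synthetic sample path. The sketch below runs rule (12) for $R$ frames of $T$ slots, computes each $c_r^*$ with a cheapest-slots-first fill (exact for (28)-(31) when $\gamma(\tau) \ge 0$), and compares the two sides of the bound. Two assumptions are ours: the $Z(t)$ update is taken as $\max[Z(t) - s(t) - x(t) + \epsilon, 0]$, and $B$ is taken as the sum of the per-slot constants derived in Appendix A, $B = \frac{1}{2}[a_{\max}^2 + (s_{\max} + x_{\max})^2 + \max[(s_{\max} + x_{\max})^2, \epsilon^2]]$, which may be written differently in (11). All data are random, not the CAISO/NREL traces.

```python
import random


def run(s, a, gamma, V, eps, x_max):
    """Rule (12) with the assumed queue updates; return the x(t) sequence."""
    Q = Z = 0.0
    xs = []
    for st, at, gt in zip(s, a, gamma):
        x = x_max if Q + Z >= V * gt else 0.0
        xs.append(x)
        Q = max(Q - st - x, 0.0) + at
        Z = max(Z - st - x + eps, 0.0)  # ASSUMED Z(t) update
    return xs


def frame_cost(s, a, gamma, eps, x_max):
    """c_r^* of (28)-(31): buy the required total in the cheapest slots first."""
    T = len(gamma)
    need = max(sum(a) - sum(s), eps * T - sum(s), 0.0)
    cost = 0.0
    for g in sorted(gamma):
        buy = min(x_max, need)
        cost += g * buy
        need -= buy
    return cost / T


rng = random.Random(1)
a_max, s_max, g_max, x_max = 175.0, 90.0, 180.0, 400.0
V, eps, T, R = 200.0, 87.5, 12, 200
N = R * T
s = [s_max * rng.random() for _ in range(N)]
a = [float(rng.randint(0, int(a_max))) for _ in range(N)]
gamma = [g_max * rng.random() for _ in range(N)]

smx = s_max + x_max
B = 0.5 * (a_max**2 + smx**2 + max(smx**2, eps**2))  # ASSUMED form of B

lhs = sum(g * x for g, x in zip(gamma, run(s, a, gamma, V, eps, x_max))) / N
c_star = [frame_cost(s[r * T:(r + 1) * T], a[r * T:(r + 1) * T],
                     gamma[r * T:(r + 1) * T], eps, x_max) for r in range(R)]
rhs = sum(c_star) / R + B * T / V
print(f"time average cost {lhs:.1f} <= bound {rhs:.1f}")
```

For these parameters $BT/V$ is large relative to typical per-slot costs, so this checks that the inequality holds rather than how tight it is.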

V. Experimental Evaluation 

We evaluated the performance of the proposed algorithm on a six-month data set that we created by combining 10-minute average spot market prices $\gamma(t)$ for the Los Angeles area (LAI) from CAISO [21] and 10-minute energy production $s(t)$ for a small subset of windfarms from the Western Wind Resources Dataset published by the National Renewable Energy Laboratory [22]. We modeled the demand $a(t)$ as i.i.d. over slots and uniformly distributed over the integers $\{0, 1, \ldots, a_{\max}\}$. We executed the proposed Lyapunov drift optimization algorithm in 10-minute timeslots and experimented with different values of the parameters $V$ and $\epsilon$ and the corresponding deadlines they generate.

We compare the proposed algorithm against a simple greedy strategy, "Purchase at deadline," which tries to use all the available resource $s(t)$ and only buys from the spot market as a last resort when a deadline is reached. As can be seen in Fig. 1, the proposed algorithm reduces the cost of the renewable supplier by approximately a factor of 2 over the tested six-month window. The slopes of the two lines differ, suggesting that the savings grow without bound as time increases. This is not surprising, since the greedy strategy does not hedge against future high prices in the spot market, while the proposed algorithm learns to proactively buy when spot market prices are lower than typical and deadline violations seem probable. The high variability of the spot market prices [21] makes this advantage significant. The second observation, seen in Fig. 2, is that the proposed algorithm has on average a much smaller delay than the deadline, which for our parameters was $D_{\max} = 70$ hours. In contrast, the greedy algorithm makes many requests wait close to (or exactly at) the maximum allowed 70 hours.
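The "Purchase at deadline" baseline is described above only in words. One plausible minimal implementation, under assumptions that are ours rather than the authors' (FIFO service, renewable energy that is not stored across slots, and a deadline of $D$ slots after which any unserved remainder of a request is bought at the current spot price $\gamma(t)$), is the following; the function name is hypothetical.

```python
from collections import deque
from typing import Sequence


def purchase_at_deadline(s: Sequence[float], a: Sequence[float],
                         gamma: Sequence[float], D: int) -> float:
    """Hypothetical greedy baseline: return the total spot-market cost."""
    queue = deque()  # FIFO of [arrival_slot, remaining_energy]
    cost = 0.0
    for t, (st, at, gt) in enumerate(zip(s, a, gamma)):
        if at > 0:
            queue.append([t, float(at)])
        free = st  # renewable supply this slot (ASSUMED not storable)
        while queue and free > 0:
            take = min(free, queue[0][1])
            queue[0][1] -= take
            free -= take
            if queue[0][1] == 0:
                queue.popleft()
        # Requests that have waited D slots are finished at the spot price.
        while queue and t - queue[0][0] >= D:
            cost += gt * queue[0][1]
            queue.popleft()
    return cost


# One 10-unit request, no supply, D = 2: bought at gamma = 3.0 in slot 2.
print(purchase_at_deadline([0.0] * 4, [10.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0], D=2))
# 30.0
```

Because this baseline ignores the price until the deadline forces a purchase, it cannot shift purchases toward cheap slots, which is the behavior the comparison above highlights.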

Our results use $\epsilon = E\{a(t)\} = a_{\max}/2$. We also conducted simulations with $\epsilon = 0$, which does not require knowledge of $E\{a(t)\}$. While $\epsilon = 0$ does not provide a finite delay guarantee, it still guarantees the same finite $Q_{\max}$. Together with FIFO service, this means that the worst case delay for requests that arrive at time $t$ is given by the smallest integer $T \ge 0$ such that $\sum_{\tau=t+1}^{t+T} s(\tau) \ge Q_{\max}$. While there is no bound on this for general $s(t)$ processes, it can still lead to small delays. Indeed, in the simulations it still kept all delays under $D_{\max} = 2.9$ days (with a maximum experimental delay of 14 hours, as compared to 9.5 hours for the $\epsilon = E\{a(t)\}$ case).⁶ Fig. 1 shows that it gives slightly better cost, largely because it allows larger delays. Both Lyapunov optimization algorithms provided significantly better cost and delay than the greedy algorithm. It should be noted that we did not compare against dynamic programming algorithms such as the one proposed in [3]. While it is clear that a dynamic programming approach could solve this problem optimally if the statistics of the underlying processes were known, one benefit of our approach is that no such prior knowledge is required. Further, the Lyapunov approach yields an efficient algorithm for multiple queues corresponding to different customers with different deadlines.
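The $\epsilon = 0$ delay expression above can be evaluated directly on a supply trace. A small sketch, where the helper name is hypothetical, `s` is a list indexed by slot, and `Q_max` is passed in as given:

```python
from typing import Optional, Sequence


def worst_case_delay(s: Sequence[float], t: int, Q_max: float) -> Optional[int]:
    """Smallest integer T >= 0 with sum_{tau=t+1}^{t+T} s(tau) >= Q_max.

    Returns None if the trace ends before the cumulative supply reaches Q_max.
    """
    total, T = 0.0, 0
    while total < Q_max:
        T += 1
        if t + T >= len(s):
            return None  # trace too short to clear a backlog of Q_max
        total += s[t + T]
    return T


print(worst_case_delay([0.0, 3.0, 3.0, 3.0, 3.0], t=0, Q_max=7.0))  # 3
```

Since the backlog ahead of a newly arrived request never exceeds $Q_{\max}$ under FIFO service, this count is a valid per-request bound, but it becomes large whenever $s(t)$ stays small for long stretches.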

⁶ For legibility, the delay data for the $\epsilon = 0$ case is not shown in Fig. 2.

Fig. 1. Cost of the renewable energy supplier for energy purchased at the spot market. For the proposed algorithm we used the parameters $a_{\max} = 175$, $\gamma_{\max} = 180$, $x_{\max} = 400$, $V = 100$, $D_{\max} = 415$ slots $\approx 2.9$ days.

Fig. 2. Histogram of delay for the customers waiting in the service queues of the renewable energy supplier under the two algorithms (vertical axis in logarithmic scale). The case $\epsilon = 0$ is not shown, but has max delay 14 hours, as compared with the $\epsilon = E\{a(t)\}$ case (shown) with max delay 9.5 hours.

We now present some further experimental results investigating the influence of varying $V$ and $\epsilon$ on the performance of the proposed algorithm. For these simulations we used the same data set as in the previous part. For the first experiment, the performance of the algorithm for different values of the parameter $V$ is compared. The rest of the parameters are unchanged: $a_{\max} = 175$, $\gamma_{\max} = 180$, $x_{\max} = 400$, $s_{\max} = 90$, and $\epsilon = 87.5$. The result is shown in Fig. 3. As expected, the cost decreases with $V$. The tradeoff is in the maximum waiting time of the requests: the maximum waiting times observed in the simulations for $V = 20, 50, 100, 200$ are 3.5, 5.8, 10.2, and 15.2 hours, respectively.⁷ For the second experiment, we consider the performance of our algorithm for different values of $\epsilon$. Here, we fixed $V = 100$ and ran the simulation for $\epsilon \in \{87.5, 60, 35, 10\}$. The cost decreases as $\epsilon$ decreases, as shown in Fig. 4. However, the maximum observed waiting time increases as $\epsilon$ decreases: for $\epsilon = 87.5, 60, 35, 10$, the maximum observed waiting times are 9.5, 11.7, 12.5, and 13.7 hours, respectively. Overall, as expected, the cost improves as $V$ is increased, with a tradeoff in waiting time. Further, the waiting time decreases as $\epsilon$ increases toward $E\{a(t)\}$, although waiting times are still reasonable even with $\epsilon = 0$, which is useful when $E\{a(t)\}$ is unknown. For non-i.i.d. situations, using a smaller value of $\epsilon$ may also reduce cost, because this relaxes the constraint (30).
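The deadline values quoted in this section can be reproduced with a few lines of arithmetic, assuming the bounds of Theorem 1 take the form $Q_{\max} = V\gamma_{\max} + a_{\max}$, $Z_{\max} = V\gamma_{\max} + \epsilon$, and $D_{\max} = \lceil (Q_{\max} + Z_{\max})/\epsilon \rceil$. Theorem 1 is stated in an earlier section, so this form is our assumption; it does reproduce the 415-slot value of Fig. 1. The sketch also tabulates the bound for the $V$ and $\epsilon$ values of Figs. 3 and 4 to show how it grows with $V/\epsilon$; these are worst-case bounds, not the observed waiting times reported above.

```python
import math

SLOT_MINUTES = 10  # the experiments use 10-minute timeslots


def d_max_slots(V: float, eps: float, gamma_max: float = 180.0,
                a_max: float = 175.0) -> int:
    """Worst case delay bound in slots, under the ASSUMED form of Theorem 1."""
    q_max = V * gamma_max + a_max
    z_max = V * gamma_max + eps
    return math.ceil((q_max + z_max) / eps)


eps = 175 / 2  # E{a(t)} for a(t) uniform on {0, 1, ..., 175}
slots = d_max_slots(100, eps)
print(slots, "slots =", slots * SLOT_MINUTES / 60, "hours =",
      slots * SLOT_MINUTES / 1440, "days")
# 415 slots, about 69.2 hours or 2.9 days

for V in (20, 50, 100, 200):
    print(f"V = {V}: D_max = {d_max_slots(V, 87.5) * SLOT_MINUTES / 60:.1f} hours")
for e in (87.5, 60, 35, 10):
    print(f"eps = {e}: D_max = {d_max_slots(100, e) * SLOT_MINUTES / 60:.1f} hours")
```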

⁷ The maximum observed waiting time for the simulation run for the $V = 100$ case of Fig. 3 was 10.2 hours, rather than 9.5 hours as in the previous simulation for the case $V = 100$. This is because this simulation used independently generated $a(t)$ values.

[Figure 3 plot: cost vs. Time (days) for Lyapunov optimization with V = 20, 50, 100, 200]

Fig. 3. Cost of the renewable energy supplier for energy purchased at the spot market for different values of $V = 20, 50, 100, 200$. For the proposed algorithm we used the parameters $a_{\max} = 175$, $x_{\max} = 400$, and $\epsilon = 87.5$.

Fig. 4. Cost of the renewable energy supplier for energy purchased at the spot market for different values of $\epsilon = 87.5, 60, 35, 10$. For the proposed algorithm we used the parameters $a_{\max} = 175$, $x_{\max} = 400$, and $V = 100$.

VI. Conclusions

This work presents a Lyapunov optimization approach to the problem of efficient use of renewable energy sources. Efficiency can be improved if consumers are flexible and can tolerate their requests being served with some delay. Two different problems were presented: one that seeks to minimize the cost associated with using an outside (possibly non-renewable) plant to meet the deadlines, and another that seeks to maximize profit by dynamically selecting a price for service. Our algorithms are simple and were shown to operate efficiently without knowing the statistical properties of the supply, demand, and energy request processes. We first considered a simple case in which these processes are i.i.d. over slots but with unknown probabilities. We next treated the general case of arbitrary (possibly non-i.i.d. and non-ergodic) sample paths. Our analysis also contributes to the theory of Lyapunov optimization by introducing a new type of virtual queue that guarantees a bounded worst case delay. Our algorithms use a parameter $V$ that can be tuned as desired to affect a performance-delay tradeoff, where the achieved cost is within $O(1/V)$ of optimal, with a worst case delay guarantee that is $O(V)$. These techniques provide a convenient alternative to dynamic programming and lead to a general framework for problems that naturally arise in scheduling of renewable energy markets.

Appendix A - Proof of Lemma 2

From the $Z(t)$ update rule we have:

$$Z(t+1) \le \max[Z(t) - s(t) - x(t) + \epsilon, 0]$$

and hence:

$$Z(t+1)^2 \le (Z(t) - s(t) - x(t) + \epsilon)^2$$

Thus:

$$\begin{aligned}\frac{Z(t+1)^2 - Z(t)^2}{2} &\le \frac{(\epsilon - s(t) - x(t))^2}{2} + Z(t)(\epsilon - s(t) - x(t)) \\ &\le \frac{1}{2}\max[(s_{\max} + x_{\max})^2, \epsilon^2] + Z(t)(\epsilon - s(t) - x(t))\end{aligned}$$

Similarly, by squaring the $Q(t)$ update equation (1) and using the inequality:

$$(\max[Q - \mu, 0] + a)^2 \le Q^2 + \mu^2 + a^2 + 2Q(a - \mu)$$

which holds for any $Q \ge 0$, $\mu \ge 0$, $a \ge 0$, we obtain:

$$\frac{Q(t+1)^2 - Q(t)^2}{2} \le \frac{1}{2}[a_{\max}^2 + (s_{\max} + x_{\max})^2] + Q(t)(a(t) - s(t) - x(t)) \tag{32}$$

Combining the above yields:

$$L(\Theta(t+1)) - L(\Theta(t)) \le B + Q(t)(a(t) - s(t) - x(t)) + Z(t)(\epsilon - s(t) - x(t)) \tag{33}$$

Taking conditional expectations of the above, given $\Theta(t)$, and adding $VE\{\gamma(t)x(t) | \Theta(t)\}$ to both sides proves the result.
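The per-slot bound (33) can be spot-checked numerically. The sketch below samples random queue states and inputs, applies the $Q(t)$ update from this appendix and the $Z(t)$ bound taken with equality (an assumption about the exact rule), and reports the largest value of the left side of (33) minus its right side, which should never be positive. We take $B$ to be the sum of the two constants above, $B = \frac{1}{2}[a_{\max}^2 + (s_{\max}+x_{\max})^2 + \max[(s_{\max}+x_{\max})^2, \epsilon^2]]$; the paper's (11) may be written differently. Parameter values are illustrative.

```python
import random


def worst_gap(trials: int = 20000, seed: int = 0) -> float:
    """Largest value of [L(t+1) - L(t)] - [right side of (33)] over random samples."""
    rng = random.Random(seed)
    a_max, s_max, x_max, eps = 175.0, 90.0, 400.0, 87.5
    smx = s_max + x_max
    B = 0.5 * (a_max**2 + smx**2 + max(smx**2, eps**2))  # ASSUMED form of B
    worst = float("-inf")
    for _ in range(trials):
        # Mix small backlogs (which hit the max[., 0] clipping) with large ones.
        Q = rng.uniform(0, 500) if rng.random() < 0.5 else rng.uniform(0, 5e4)
        Z = rng.uniform(0, 500) if rng.random() < 0.5 else rng.uniform(0, 5e4)
        a, s, x = rng.uniform(0, a_max), rng.uniform(0, s_max), rng.uniform(0, x_max)
        Q1 = max(Q - s - x, 0.0) + a
        Z1 = max(Z - s - x + eps, 0.0)  # ASSUMED: Appendix A bound with equality
        drift = 0.5 * (Q1**2 + Z1**2) - 0.5 * (Q**2 + Z**2)
        rhs = B + Q * (a - s - x) + Z * (eps - s - x)
        worst = max(worst, drift - rhs)
    return worst


print(worst_gap())  # expected to be negative
```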

Appendix B - Proof of Theorem 3

Again define $\Theta(t) = [Q(t), Z(t)]$, and define the Lyapunov function $L(\Theta(t))$ the same as before:

$$L(\Theta(t)) \triangleq \frac{1}{2}[Q(t)^2 + Z(t)^2]$$

As in [20], [17], for a given integer $T > 0$, we define the $T$-slot sample path drift $\Delta_T(\Theta(t))$ as follows:

$$\Delta_T(\Theta(t)) \triangleq L(\Theta(t+T)) - L(\Theta(t))$$

This differs from our 1-slot conditional drift $\Delta(\Theta(t))$, used for the i.i.d. analysis, because (i) it involves $T$ slots, rather than 1 slot, and (ii) it does not use an expectation.

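Now suppose that the values $(a(\tau), s(\tau), \gamma(\tau))$ and $x(\tau)$ satisfy the following for all $\tau$:

$$0 \le a(\tau) \le a_{\max}, \quad 0 \le s(\tau) \le s_{\max} \tag{34}$$

$$0 \le \gamma(\tau) \le \gamma_{\max}, \quad 0 \le x(\tau) \le x_{\max} \tag{35}$$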

We have the following lemma.

Lemma 4: Fix any slot $t$, any queue state $\Theta(t) = [Q(t), Z(t)]$, and any integer $T > 0$. Consider an arbitrary sample path for $a(\tau)$, $s(\tau)$, $\gamma(\tau)$ over the interval $\tau \in \{t, t+1, \ldots, t+T-1\}$, assumed only to satisfy (34)-(35). Assume that the decisions for $x(\tau)$ are given by the algorithm (12), with queue updates for $Q(t)$ and $Z(t)$ given by (1) and the $Z(t)$ update rule. Then:

$$\begin{aligned}\Delta_T(\Theta(t)) + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x(\tau) \le{}& BT^2 + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x^*(\tau) \\ &+ Q(t)\sum_{\tau=t}^{t+T-1}[a(\tau) - s(\tau) - x^*(\tau)] \\ &+ Z(t)\sum_{\tau=t}^{t+T-1}[\epsilon - s(\tau) - x^*(\tau)]\end{aligned}$$

where $x^*(\tau)$ are any alternative choices that satisfy $0 \le x^*(\tau) \le x_{\max}$ for all $\tau \in \{t, \ldots, t+T-1\}$. The constant $B$ is given in (11).

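Proof: From (33) we have that for all $\tau$:

$$L(\Theta(\tau+1)) - L(\Theta(\tau)) \le B + Q(\tau)(a(\tau) - s(\tau) - x(\tau)) + Z(\tau)(\epsilon - s(\tau) - x(\tau))$$

Summing the result over $\tau \in \{t, \ldots, t+T-1\}$ yields:

$$\Delta_T(\Theta(t)) \le BT + \sum_{\tau=t}^{t+T-1} Q(\tau)(a(\tau) - s(\tau) - x(\tau)) + \sum_{\tau=t}^{t+T-1} Z(\tau)(\epsilon - s(\tau) - x(\tau))$$

Adding the penalty term to both sides yields:

$$\begin{aligned}\Delta_T(\Theta(t)) + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x(\tau) \le{}& BT + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x(\tau) \\ &+ \sum_{\tau=t}^{t+T-1} Q(\tau)(a(\tau) - s(\tau) - x(\tau)) \\ &+ \sum_{\tau=t}^{t+T-1} Z(\tau)(\epsilon - s(\tau) - x(\tau))\end{aligned}$$

We now use the fact that for each slot $\tau$, the value of $x(\tau)$ is chosen to minimize:

$$x(\tau)[V\gamma(\tau) - Q(\tau) - Z(\tau)]$$

over all $x(\tau)$ such that $0 \le x(\tau) \le x_{\max}$. It follows that:

$$\begin{aligned}\Delta_T(\Theta(t)) + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x(\tau) \le{}& BT + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x^*(\tau) \\ &+ \sum_{\tau=t}^{t+T-1} Q(\tau)(a(\tau) - s(\tau) - x^*(\tau)) \\ &+ \sum_{\tau=t}^{t+T-1} Z(\tau)(\epsilon - s(\tau) - x^*(\tau))\end{aligned}$$

where for all $\tau \in \{t, \ldots, t+T-1\}$, $x^*(\tau)$ is any value that satisfies $0 \le x^*(\tau) \le x_{\max}$. Now note that the maximum changes in the $Q(t)$ and $Z(t)$ queues on one slot are given by constants $C_Q$ and $C_Z$, respectively, defined:

$$C_Q \triangleq \max[s_{\max} + x_{\max}, a_{\max}], \qquad C_Z \triangleq \max[s_{\max} + x_{\max}, \epsilon]$$

Thus:

$$|(Q(\tau) - Q(t))(a(\tau) - s(\tau) - x^*(\tau))| \le C_Q^2(\tau - t)$$

$$|(Z(\tau) - Z(t))(\epsilon - s(\tau) - x^*(\tau))| \le C_Z^2(\tau - t)$$

We can thus replace the right hand side of the above drift inequality with:

$$\begin{aligned}\Delta_T(\Theta(t)) + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x(\tau) \le{}& BT + \frac{(C_Q^2 + C_Z^2)T(T-1)}{2} + V\sum_{\tau=t}^{t+T-1}\gamma(\tau)x^*(\tau) \\ &+ Q(t)\sum_{\tau=t}^{t+T-1}(a(\tau) - s(\tau) - x^*(\tau)) \\ &+ Z(t)\sum_{\tau=t}^{t+T-1}(\epsilon - s(\tau) - x^*(\tau))\end{aligned}$$

where we have used the fact that $\sum_{\tau=t}^{t+T-1}(\tau - t) = T(T-1)/2$. However, it is not difficult to show that:

$$\frac{C_Q^2 + C_Z^2}{2} \le B$$

and hence:

$$BT + \frac{C_Q^2 + C_Z^2}{2}T(T-1) \le BT^2$$

This proves the result. □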
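The "not difficult to show" step follows because $C_Q^2 = \max[(s_{\max}+x_{\max})^2, a_{\max}^2] \le (s_{\max}+x_{\max})^2 + a_{\max}^2$ and $C_Z^2 = \max[(s_{\max}+x_{\max})^2, \epsilon^2]$, provided $B$ has the form $\frac{1}{2}[a_{\max}^2 + (s_{\max}+x_{\max})^2 + \max[(s_{\max}+x_{\max})^2, \epsilon^2]]$ obtained from the Appendix A bounds (an assumption about how (11) is written). The sketch below checks this inequality, the summation identity, and the final $BT^2$ bound over random parameter values; the function names are hypothetical.

```python
import random


def constants(a_max: float, s_max: float, x_max: float, eps: float):
    """Return (B, C_Q, C_Z) for the given bounds (B in its ASSUMED form)."""
    smx = s_max + x_max
    B = 0.5 * (a_max**2 + smx**2 + max(smx**2, eps**2))
    C_Q = max(smx, a_max)  # largest one-slot change of Q(t)
    C_Z = max(smx, eps)    # largest one-slot change of Z(t)
    return B, C_Q, C_Z


def check_lemma4_constants(trials: int = 1000, seed: int = 2) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        B, C_Q, C_Z = constants(*(rng.uniform(0, 500) for _ in range(4)))
        assert (C_Q**2 + C_Z**2) / 2 <= B
        for T in range(1, 50):
            # sum_{tau=t}^{t+T-1} (tau - t) = T(T-1)/2
            assert sum(range(T)) == T * (T - 1) // 2
            lhs = B * T + (C_Q**2 + C_Z**2) / 2 * T * (T - 1)
            assert lhs <= B * T**2 * (1 + 1e-12)  # tolerance for rounding only
    return True


print(check_lemma4_constants())
```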
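Now fix a frame size $T > 0$, consider the timeline decomposed into $R$ successive frames of size $T$, and consider any frame $r \in \{0, 1, \ldots, R-1\}$. Define $c_r^*$ as the optimum cost in the frame-$r$ problem (28)-(31), and define $x^*(\tau)$ for $\tau \in \{rT, \ldots, rT+T-1\}$ as the optimal decisions for that problem, which achieve $c_r^*$ and satisfy the inequality constraints (29)-(31). Then using the drift bound given in Lemma 4 together with (28)-(31), we have:

$$\Delta_T(\Theta(rT)) + V\sum_{\tau=rT}^{rT+T-1}\gamma(\tau)x(\tau) \le BT^2 + VTc_r^*$$

Indeed, (28) gives $\sum_{\tau=rT}^{rT+T-1}\gamma(\tau)x^*(\tau) = Tc_r^*$, while (29) and (30) make the sums multiplying $Q(rT) \ge 0$ and $Z(rT) \ge 0$ in Lemma 4 nonpositive, so those two terms can be dropped. Summing the above over $r \in \{0, 1, \ldots, R-1\}$, using the definition of $\Delta_T(\Theta(t))$ (so that the drifts telescope to $L(\Theta(RT)) - L(\Theta(0))$), and dividing by $RTV$ yields:

$$\frac{L(\Theta(RT)) - L(\Theta(0))}{RTV} + \frac{1}{RT}\sum_{\tau=0}^{RT-1}\gamma(\tau)x(\tau) \le \frac{BT}{V} + \frac{1}{R}\sum_{r=0}^{R-1}c_r^*$$

Using the fact that $L(\Theta(0)) = 0$ and $L(\Theta(RT)) \ge 0$ yields the result.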

Appendix C - Proof of Theorem 2

Part (a) follows by noting that the proof of parts (a) and (b) in Theorem 1 holds exactly in this new context, as we have not changed the queueing dynamics for $Q(t)$ or $Z(t)$ or the fact that $a(t) \le a_{\max}$ for all $t$.

We now prove part (b). We have assumed that $\epsilon \le \max[a^*, E\{s(t)\}]$. We first prove the result for the case $\epsilon \le a^*$. On each slot $t$ our dynamic algorithm makes actions $b(t)$, $p(t)$, $x(t)$ that, given the observed $\Theta(t) = [Q(t), Z(t)]$, minimize the right hand side of the drift inequality (25) over all alternative choices. Thus:

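$$\begin{aligned}\Delta(\Theta(t)) - VE\{\phi(t)|\Theta(t)\} \le{}& B - VE\{b^*(t)p^*(t)F(p^*(t), y(t), \gamma(t)) - \gamma(t)x^*(t) | \Theta(t)\} \\ &+ Q(t)E\{b^*(t)F(p^*(t), y(t), \gamma(t)) - s(t) - x^*(t) | \Theta(t)\} \\ &+ Z(t)E\{\epsilon - s(t) - x^*(t) | \Theta(t)\}\end{aligned} \tag{36}$$

where $b^*(t)$, $p^*(t)$, $x^*(t)$ are any other choices that satisfy:

$$0 \le x^*(t) \le x_{\max}, \quad 0 \le p^*(t) \le p_{\max}, \quad b^*(t) \in \{0, 1\} \quad \forall t$$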
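The structure of (36) also shows what the per-slot pricing decision looks like. Among the terms on the right hand side that depend on the actions, $b(t)$ and $p(t)$ enter only through $-b(t)F(p(t), y(t), \gamma(t))[Vp(t) - Q(t)]$, and $x(t)$ enters only through $x(t)[V\gamma(t) - Q(t) - Z(t)]$, so the two choices decouple. The sketch below minimizes these terms over a finite price grid. The grid, the linear demand model `linear_demand`, and the function names are hypothetical illustrations, not the paper's model of $F$.

```python
from typing import Callable, Sequence, Tuple

DemandFn = Callable[[float, float, float], float]


def linear_demand(p: float, y: float, gamma: float) -> float:
    """HYPOTHETICAL F(p, y, gamma): expected demand falls linearly in price."""
    return max(y - p, 0.0)


def pricing_decision(Q: float, Z: float, V: float, y: float, gamma: float,
                     x_max: float, prices: Sequence[float],
                     F: DemandFn) -> Tuple[int, float, float]:
    """Minimize -b*F(p)*(V*p - Q) + x*(V*gamma - Q - Z) over b, p, x."""
    # Pricing part: pick the price maximizing F(p)*(V*p - Q); offer service
    # (b = 1) only if that maximum is positive.
    best_p = max(prices, key=lambda p: F(p, y, gamma) * (V * p - Q))
    gain = F(best_p, y, gamma) * (V * best_p - Q)
    b, p = (1, best_p) if gain > 0 else (0, 0.0)
    # Purchasing part: the same bang-bang rule as (12).
    x = x_max if Q + Z >= V * gamma else 0.0
    return b, p, x


grid = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(pricing_decision(0.0, 0.0, 1.0, 10.0, 1.0, 5.0, grid, linear_demand))    # (1, 4.0, 0.0)
print(pricing_decision(100.0, 0.0, 1.0, 10.0, 1.0, 5.0, grid, linear_demand))  # (0, 0.0, 5.0)
```

With an empty backlog the rule offers service at a revenue-maximizing price; with a large backlog $Q(t)$ it declines new demand ($b(t) = 0$) and buys $x_{\max}$.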

We now use the existence of an $(s, y, \gamma)$-only policy $x^*(t)$, $b^*(t)$, $p^*(t)$ that satisfies the inequalities (26)-(27). It is not difficult to show that (26)-(27) are equivalent to the following:

$$E\{b^*(t)p^*(t)F(p^*(t), y(t), \gamma(t)) - \gamma(t)x^*(t) | \Theta(t)\} = \phi^* \tag{37}$$

$$E\{b^*(t)F(p^*(t), y(t), \gamma(t)) - s(t) - x^*(t) | \Theta(t)\} \le 0 \tag{38}$$

$$E\{b^*(t)F(p^*(t), y(t), \gamma(t)) | \Theta(t)\} = a^* \tag{39}$$

where the above conditional expectations (37)-(39) given $\Theta(t)$ are the same as the unconditional expectations, because the $(s, y, \gamma)$-only policy does not depend on the queue states $\Theta(t)$ (recall that $(s(t), y(t), \gamma(t))$ is i.i.d. over slots and hence independent of queue states). Plugging (37)-(39) directly into the right hand side of (36) (the $Q(t)$ term is nonpositive by (38), and (38)-(39) give $E\{\epsilon - s(t) - x^*(t) | \Theta(t)\} \le \epsilon - a^*$) yields:

$$\Delta(\Theta(t)) - VE\{\phi(t)|\Theta(t)\} \le B - V\phi^* + Z(t)(\epsilon - a^*) \tag{40}$$

Because we have assumed that $\epsilon \le a^*$, this reduces to:

$$\Delta(\Theta(t)) - VE\{\phi(t)|\Theta(t)\} \le B - V\phi^* \tag{41}$$

Taking expectations of the above (with respect to the random $\Theta(t)$) and using the law of iterated expectations gives:

$$E\{L(\Theta(t+1))\} - E\{L(\Theta(t))\} - VE\{\phi(t)\} \le B - V\phi^*$$

The above holds for all slots $t$. Summing over $\tau \in \{0, \ldots, M-1\}$ for some integer $M > 0$ yields:

$$E\{L(\Theta(M))\} - E\{L(\Theta(0))\} - V\sum_{\tau=0}^{M-1}E\{\phi(\tau)\} \le M(B - V\phi^*)$$

Dividing by $VM$ and using the fact that $E\{L(\Theta(0))\} = 0$ and $E\{L(\Theta(M))\} \ge 0$ yields:

$$\frac{1}{M}\sum_{\tau=0}^{M-1}E\{\phi(\tau)\} \ge \phi^* - \frac{B}{V}$$

This holds for all $M > 0$, proving the result for the case $\epsilon \le a^*$.

We have used the fact that $\epsilon \le a^*$ only in showing that the $Z(t)(\epsilon - a^*)$ term on the right hand side of (40) can be removed while preserving the inequality. However, suppose that $\epsilon \le E\{s(t)\}$. Then the $Z(t)E\{\epsilon - s(t) - x^*(t)|\Theta(t)\}$ term in the right hand side of (36) can immediately be removed (recall that $x^*(t) \ge 0$ and $E\{s(t)\} = E\{s(t)|\Theta(t)\}$ because $s(t)$ is i.i.d. over slots and hence independent of the current queue backlog). This leads directly to (41) regardless of the value of $a^*$. Thus, the result holds whenever $\epsilon \le \max[a^*, E\{s(t)\}]$, proving the theorem.